Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers

Authors

  • Buhm Han
  • Hyun Min Kang
  • Eleazar Eskin

Abstract

With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, which makes methods for correcting for multiple hypothesis testing increasingly important. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for such large datasets. Recently, several studies proposed efficient alternatives to the permutation test based on the multivariate normal distribution (MVN). However, these methods cannot accurately correct for multiple testing in genome-wide association studies, for two reasons. First, they require partitioning the genome into many disjoint blocks and ignore all correlations between markers in different blocks. Second, the true null distribution of the test statistic often departs from the asymptotic distribution in the tails. We propose SLIDE, an accurate and efficient method for multiple testing correction in genome-wide association studies. SLIDE accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than that of the previous MVN-based methods, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers, and propose SLIP, an efficient and accurate power estimation method. SLIP and SLIDE are available at http://slide.cs.ucla.edu.
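The MVN idea in the abstract can be illustrated with a small Monte Carlo sketch. It assumes that, under the null, the per-marker z-scores are approximately N(0, R), where R is the marker correlation matrix, and that a corrected p-value is the probability that the maximum |z| over all markers reaches the observed maximum. Instead of disjoint blocks, each marker is sampled conditionally on the (at most) `w` markers before it, which is one plausible way to realize "all correlation within a sliding window." The helpers `window_coefficients` and `exceedance_rate` and the window size `w` are hypothetical names chosen for this sketch. It is not SLIDE's or SLIP's actual implementation, and it does not apply SLIDE's correction for the tail departure from the asymptotic null.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with a = L L^T, for a symmetric positive-definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_spd(a, b):
    """Solve a x = b via Cholesky: forward substitution, then back substitution."""
    L = cholesky(a)
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def window_coefficients(corr, w):
    """Hypothetical helper: regress z_i on the (at most) w preceding markers.

    Returns (window indices, regression weights, conditional variance) per marker.
    Correlation with markers outside the window is not conditioned on directly.
    """
    coeffs = []
    for i in range(len(corr)):
        win = list(range(max(0, i - w), i))
        if not win:
            coeffs.append((win, [], 1.0))
            continue
        sub = [[corr[a][b] for b in win] for a in win]
        cross = [corr[a][i] for a in win]
        beta = solve_spd(sub, cross)
        var = 1.0 - sum(c * bt for c, bt in zip(cross, beta))
        coeffs.append((win, beta, max(var, 0.0)))
    return coeffs


def exceedance_rate(corr, threshold, noncentrality=None, w=5, n_samples=20000, seed=0):
    """Hypothetical helper: Monte Carlo estimate of P(max_i |z_i + mu_i| >= threshold).

    z is drawn marker by marker from the sliding-window conditional normals,
    so it approximates N(0, corr); mu defaults to zero (the null hypothesis).
    """
    rng = random.Random(seed)
    coeffs = window_coefficients(corr, w)
    mu = [0.0] * len(corr) if noncentrality is None else noncentrality
    hits = 0
    for _ in range(n_samples):
        z = []
        for win, beta, var in coeffs:
            mean = sum(bt * z[k] for bt, k in zip(beta, win))
            z.append(rng.gauss(mean, math.sqrt(var)))
        if max(abs(v + m) for v, m in zip(z, mu)) >= threshold:
            hits += 1
    return hits / n_samples
```

For a corrected p-value, one would call `exceedance_rate(R, observed_max_abs_z)`. For a power estimate in the spirit of SLIP, one would pass the per-marker significance threshold together with an assumed noncentrality vector for the causal markers. The sliding window keeps each conditional draw to a `w`-by-`w` solve that is precomputed once, rather than a factorization of the full genome-wide matrix.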

Journal title:
  • PLoS Genetics

Volume 5

Publication date: 2009